feat/add-support-qwen3#17

Open
fengzz-coding wants to merge 10 commits into main from feat/add-support-qwen3

Conversation

@fengzz-coding
Contributor

No description provided.

@fengzz-coding fengzz-coding requested a review from nickcom007 March 2, 2026 13:35
@fengzz-coding fengzz-coding requested review from nickcom007 and removed request for nickcom007 March 12, 2026 04:05
role in ["user", "assistant", "function_call"]
and content
):
if not content:
1. Reference extraction can be misaligned with the actual prompt

You extract the reference using:

last_msg = conversations[-1]

but you truncate the processed messages with:

conversation_to_process = conversation_to_process[:-1]

Problem:

  • conversations = raw input
  • conversation_to_process = filtered + transformed version

If any messages were:

  • skipped (empty content),
  • transformed (e.g. function_call → assistant),
  • or failed parsing,

then the “last raw message” may not match the “last processed message”.

You may remove one message from the prompt, but use a different one as the reference.
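One way to keep the two aligned is to take the reference from the processed list itself, so the message dropped from the prompt is exactly the one used as the reference. A minimal sketch (the helper name and the filtering step are illustrative, not from the PR):

```python
def split_prompt_and_reference(conversation_to_process):
    # Take the reference from the *processed* list, so the message removed
    # from the prompt is exactly the one used as the reference.
    if not conversation_to_process:
        raise ValueError("empty conversation after filtering")
    *prompt_messages, last_msg = conversation_to_process
    return prompt_messages, last_msg["content"]

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": ""},   # skipped by filtering
    {"role": "assistant", "content": "hello"},
]
# after filtering empty-content messages, raw[-1] and filtered[-1] can differ:
filtered = [m for m in msgs if m["content"]]
prompt, ref = split_prompt_and_reference(filtered)
```

Here `ref` is `"hello"`, the same message that was removed from the prompt, even though a raw `msgs[-1]`-style lookup would have agreed only by luck.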


2. function_call reference is raw JSON string (may not match your evaluation goal)

When the last message is a function_call, you do:

reference_response = last_msg["content"]

This gives you something like:

{"name":"get_weather","arguments":{"city":"Toronto"}}

So your reference is:

  • a raw JSON string, not
  • a structured tool call, nor
  • a natural language answer

This is only correct if your evaluation expects:

  • exact string match of the function call JSON

Otherwise it may be inconsistent with:

  • how your template represents tool calls (tool_calls structure)
  • or how your model outputs them
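If the goal is to compare calls rather than strings, one option is to parse the raw string into a canonical tool-call structure before comparing, so key order and whitespace in the raw JSON stop mattering. A sketch (the `normalize_function_call` helper is hypothetical):

```python
import json

def normalize_function_call(raw: str) -> dict:
    # Parse the raw function_call string and re-serialize the arguments
    # with sorted keys, so two equivalent calls compare equal.
    call = json.loads(raw)
    return {
        "type": "function",
        "function": {
            "name": call["name"],
            "arguments": json.dumps(call.get("arguments", {}), sort_keys=True),
        },
    }

a = normalize_function_call('{"name":"get_weather","arguments":{"city":"Toronto"}}')
b = normalize_function_call('{"arguments": {"city": "Toronto"}, "name": "get_weather"}')
assert a == b  # same call, different raw strings
```

The same normalization can be applied to the model output before scoring, which keeps the reference consistent with a tool_calls-style template representation.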

3. Tool call ↔ observation matching is simplified (not robust)

You assign each observation to the most recent tool call:

for prev_msg in reversed(conversation_to_process):
    if prev_msg.get("role") == "assistant" and prev_msg.get("tool_calls"):
        tool_call_id = prev_msg["tool_calls"][0]["id"]
        break

This assumes:

  • one tool call at a time
  • one observation per call
  • strictly sequential flow

Works fine for simple ReAct-style traces like:

assistant → tool_call
tool → observation
assistant → next step

But breaks or becomes ambiguous if:

  • multiple tool calls in one assistant message
  • multiple observations
  • parallel or interleaved calls
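One more robust pairing, assuming observations arrive in the same order as the calls that produced them, is a FIFO queue of pending tool-call ids instead of "most recent assistant message". A sketch (helper name is illustrative):

```python
from collections import deque

def attach_tool_call_ids(messages):
    # Queue every tool_call id as it appears; each observation consumes
    # the oldest pending id. Handles multiple tool_calls per assistant
    # message, but still assumes observations arrive in call order.
    pending = deque()
    out = []
    for msg in messages:
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            pending.extend(tc["id"] for tc in msg["tool_calls"])
        elif msg.get("role") == "tool":
            if not pending:
                raise ValueError("observation without a pending tool call")
            msg = {**msg, "tool_call_id": pending.popleft()}
        out.append(msg)
    return out

trace = [
    {"role": "assistant", "tool_calls": [{"id": "c1"}, {"id": "c2"}]},
    {"role": "tool", "content": "obs1"},
    {"role": "tool", "content": "obs2"},
]
paired = attach_tool_call_ids(trace)
```

With the current `[0]["id"]` lookup, both observations would be attributed to `c1`; the queue pairs `obs1` with `c1` and `obs2` with `c2`. Truly interleaved or out-of-order observations would still need explicit ids on the tool messages.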

